This study reports the analyses of data collected from an evaluation effort for 2 Mission
Ready Technician (MRT) training programs for C-141 transport and F-16 fighter aircraft crew
chiefs. We obtained ratings from over 100 trainees in each program, as well as from their
trainers and supervisors, both during training and in the field via survey. The goal of this
research was to explore the criterion space set up for this evaluation. Whereas past
evaluation research has explored task difficulty, frequency, and importance, this research
explores an expanded criterion space, including task confidence, task performance, task
difficulty, and task frequency. Descriptive statistics, predictive regressions, and
exploratory factor analyses are reported. We conclude that the data show a similar factor
structure for both aircraft and that MRT frequency of task performance and confidence
ratings are highly predictive of field performance. A major implication is that one way to
optimize the effectiveness of training is to emphasize the development of trainee confidence
at a relatively micro level, such as the task level.


TRAINING  EVALUATION 

Training evaluation is the programmatic process whereby the outcomes of training are tracked
and analyzed. Data provided by the American Society for Training and Development (Bassi,
Cheney, & VanBuren, 1997) affirmed that over 90% of surveyed private organizations evaluate
training in some fashion. Bassi et al. reported that demonstrating training outcomes is one
of the top 10 trends in human resources and will continue to be in the top 10 for at least 3
more years. If history is any guide, training evaluation will be a core concern for most
organizations far into the future.

Given the perennial importance of training evaluation, it is not surprising that researchers
are actively involved in efforts to understand how to improve our ability to evaluate
effectively. One area of research has focused on the nature of training criteria. A
substantial amount of research on this topic has been published. Some research has examined
the relations among training criteria (e.g., Alliger & Janak, 1989; Alliger, Tannenbaum,
Bennett, Traver, & Shotland, 1997). Other research has focused on drawing together the “big
picture” about what criterion outcomes of hundreds of past training evaluation studies tell
us (e.g., W. Bennett, 1995). Still other researchers have explored the possibility of new
conceptualizations of training criteria (e.g., Kraiger, Ford, & Salas, 1993).

Perhaps  surprisingly,  much  of  the  language  of  training  evaluation  criteria  is  still 
driven  by  the  simple  taxonomy  proposed  by  Kirkpatrick  in  1959  (Kirkpatrick, 
1959a,  1959b,  1960a,  1960b).  Although  there  is  some  discussion  as  to  whether  this 
taxonomy  should  be  augmented  (e.g.,  Alliger  et  al.,  1997)  or  discarded  in  favor  of 
new  taxonomies  (e.g.,  Kraiger  et  al.,  1993),  it  seems  indisputable  that  we  need  to 
continue  to  look  closely  at  real-world  training  criteria  in  order  both  to  understand 
how  criteria  are  related  to  one  another  and  to  inform  our  choice  of  measures. 

Accordingly,  this  article  attempts  to  draw  together  empirically  based  lessons 
about  training  criteria  as  collected  in  a  related  series  of  training  evaluation  efforts. 
Specifically,  we  report  analyses  relevant  to  current  research  on  training  criteria 
drawn  from  our  experiences  with  evaluation  of  two  of  the  United  States  Air  Force’s 
Mission  Ready  Technician  (MRT)  training  programs.  As  the  name  implies,  the 
MRT  program  is  designed  to  train  and  prepare  a  technician  to  be  proficient  in  job 
tasks on the first day of assignment. A number of transport, fighter, and attack aircraft
training programs have adopted the MRT approach (e.g., C-130 and C-141, F-15 and F-16, and
A-10, respectively). MRT training programs are typically longer than traditional technical
training and involve training on actual operational aircraft.


EVALUATION OF THE AIR FORCE MRT PROGRAM 61


RESEARCH  OBJECTIVE 

The goal in this article is not to report the effectiveness results of the MRT program.
Rather, the goal is to explore the training criterion space that was set up as part of the
effort to evaluate these programs. By training criterion space, we mean the criteria
considered as a whole, in terms of fundamental structure and utility. Consequently, we took
an extended correlational analysis approach, using correlations, factor analysis, and
regression.

The training criterion space that we examine here is based on ratings of perceptions and
performance from multiple sources. Naturally, no one study can include all conceivable
measures that might be of interest in examining the effectiveness of training. Our
objective, however, is to examine task ratings in numbers sufficient to allow us to examine
the underlying structure, of which those ratings are the visible representation.


LITERATURE  REVIEW  ON  TASK  RATINGS 

In much of the literature in the area of job analysis, human factors, and industrial or
organizational psychology, a task survey is a tool for understanding which tasks in a job
are most important for successful job performance, which are most frequently performed, and
which are most difficult. In the MRT evaluation project, however, although frequency of task
performance was assessed, a number of scales unusual for standard task surveys were also
incorporated. These included, for example, task performance, number of times performed in
training or on the job, and task confidence in training and on the job. One goal in
including these variables was to begin an examination of an expanded criterion space for
training evaluation.

General  Discussion  of  the  Concept  and  Use  of  Task  Rating 
Descriptions 

Task  analysis  is  an  important  technique  for  anyone  wishing  to  understand  work. 
Research  into  task  analysis  has  taken  many  different  forms.  Much  of  the  research 
addresses  how  to  write  or  categorize  actual  task  statements.  As  one  example,  C.  A. 
Bennett  (1971)  found  that  tasks  could  be  sorted  into  cognitive,  social,  procedural, 
and  physical  categories.  There  are  many  other  examples,  and  this  stream  of  task 
analysis  research  has  been  extensively  detailed  in  Fleishman  and  Quaintance 
(1984). But task analysis can be used not only to understand work but also to assist
in  the  evaluation  of  training.  This  is  true  because  the  assessment  of  the  quality  of 
performance  at  the  task  level,  and  various  perceptions  of  tasks  (such  as  difficulty 
and  frequency),  can  be  analyzed  in  terms  of  how  effective  the  training  is  and  how 
the  training  may  be  improved.  In  fact,  the  data  gathered  in  this  study  did  result  in 
concrete  training  improvements,  as  mentioned  later. 

Research in the area of the actual ratings that are made for various tasks has been limited.
That is, research is limited on whether and why tasks are to be rated in terms of frequency
of performance, importance of performance, difficulty of performance, and so forth. It can
be argued that this is, in part, because job analysis and task analysis, two areas where
task ratings play a central role, have traditionally been largely atheoretical (aside from
the inductive taxonomic work discussed by Fleishman & Quaintance, 1984).

Task ratings are obtained by asking job incumbents or supervisors to judge some task
dimension directly. That is, the instructions may simply tell the incumbent to rate task
“importance,” “difficulty,” or “frequency.” Usually an anchored scale will be used (e.g.,
very unimportant to very important; Bernardin, 1988). Or task scales may use anchors that
are somewhat derivative, as in using crucial as the highest anchor on a task-importance
scale (Drauden, 1988), or impossible as the highest anchor on a difficulty scale (Beatty,
Coleman, & Schneier, 1988).

It is remarkable how little we know about how people interpret and use task-rating scales.
The scales that have traditionally been used are restricted in number, and there is little
to guide the practitioner in terms of what scales to use for a given purpose (Fiegelson &
Alliger, 1998).

For example, one problem that has received limited attention is the issue of redundancy
among task-rating scales. Another problem relates to whether different sources of task
ratings provide the same or different results. A brief review of research relating to these
problems follows.

Task  Dimension  Redundancy  Research 

Sanchez and Levine (1989) found that task importance and criticality ratings may provide
redundant information (mean r across four different jobs = .82), but ratings of relative
time spent and task responsibility provide unique information. Sanchez and Fraser (1992)
found that the type of job moderated some scale relations. For example, difficulty of
learning for reference librarians was highly related to dimensions of importance and time
spent and much less related for cruise line representatives. Ranges, Yost, and Cox (1991)
collected ratings of task frequency, difficulty, and importance and found a high correlation
between frequency and importance (mean r across raters = .86), suggesting redundancy.
Difficulty ratings, however, appeared to provide some unique information in their study
(mean rs with frequency and importance = .54 and .50, respectively). Bernardin (1988) found
a moderate correlation between time spent and importance (r = .42, N = 370). Reilly and
Israelski (1988), on the other hand, found a negligible correlation between task frequency
and importance (r = .03, N = 119) but found more substantial relations between frequency and
difficulty and importance and difficulty (rs = -.53 and .30, respectively). Using a very
large sample of computer programmers, Alliger, Feinzig, Wong, and Douglas (1992) reported
that ratings of task frequency and importance correlate substantially (about r = .60), but
ratings of task difficulty to learn are virtually independent of importance and frequency.

Taken as a whole, the existing research on redundancy among task-rating dimensions has
yielded mixed results. There is certainly some evidence for redundancy of importance and
frequency ratings, but it is unclear to what extent this can be generalized.

Task  Ratings  and  Rating  Source 

Some time ago, it was in vogue for researchers to examine “source effects”: whether and how
task ratings are affected by who does the rating. For example, a typical research study
might have examined whether manager ratings of how often job incumbents performed certain
tasks differed from how often the incumbents themselves rated the tasks as being performed.
One main reason for this research was to arrive at practical suggestions on who should
complete task ratings.

Research has explored ratings made by high versus low performers (Conley & Sackett, 1987;
Wexley & Silverman, 1978), individuals in various job functions (Dowell & Wexley, 1978;
Schmitt & Cohen, 1989), respondents at different job levels (Cornelius, 1980; Smith & Hakel,
1979), different genders (Arvey, Davis, McGowen, & Dipboye, 1982; Arvey, Passino, &
Lounsbury, 1977; Ferris, Fedor, Rowland, & Porac, 1985; Schmitt & Cohen, 1989), different
races (Schmitt & Cohen, 1989), and experts versus nonexperts (Cornelius, DeNisi, & Blencoe,
1984). Results from these studies have suggested that ratings of task characteristics can
vary depending on the source. In this study, we gathered ratings from supervisors and
trainers as well as trainees.

Task  Ratings  Used  in  This  Research 

Given the background provided previously, Table 1 summarizes the criteria used in this
research, categorized by several different major areas, specifically task performance, task
demand, task confidence, and task frequency and timing. These criteria were gathered from a
variety of sources, specifically MRT trainee, MRT graduate, instructor, and supervisor. As
presented previously, there is strong empirical and theoretical justification for choosing
these areas to examine.

TABLE 1
Task Measures by Venue, Source, and Operationalization

| Task Criterion Measure Area | Venue | Source | How Operationalized |
|---|---|---|---|
| Frequency and timing of task performance | Schoolhouse | MRT trainee | Number of times you observed others performing the task |
| | Schoolhouse | MRT trainee | Number of times task was performed hands-on |
| | Schoolhouse | Instructor | Number of times tested |
| | Schoolhouse | MRT trainee | Recommended number of times to perform task in training (F-16 only) |
| | Field | MRT graduate | Number of times the task is performed per month |
| Task confidence | Schoolhouse | MRT trainee | Confidence in ability to perform task (1 to 5 scale) (F-16 only) |
| | Field | MRT graduate | Confidence that you can perform each task correctly the first time (1 to 5 scale) |
| Task performance | Schoolhouse | Instructor | How much of the task can the trainee perform (0 = not performed, 5 = can do complete task) |
| | Field | Supervisor | Percentage performing task |
| | Field | Supervisor | How much of the task can the MRT perform (0 = not performed, 5 = can do complete task) |
| Task demand | Schoolhouse | MRT trainee | Difficulty of task to learn (C-141 only) |
| | Schoolhouse | MRT trainee | Difficulty to perform (C-141 only) |
| | Field | MRT graduate | Time, in hours, to complete the task once without interruptions or delays |
| | Field | Supervisor | Percentage performing below standard |
| | Field | MRT graduate | Month on the job task first performed |
| | Field | Supervisor | Number of times performed before the airman could perform the task without supervision |

Note. With two exceptions, the same information was gathered from both F-16 and C-141
personnel. Differences in information gathered were out of the control of the researchers.
First, “difficulty to learn” information gathered from C-141 personnel was dropped in F-16
data collection and replaced by “recommended number of times to perform the task in
training.” Second, “difficulty to perform” information gathered from C-141 personnel was
dropped in the F-16 data collection and replaced by “confidence in training.” MRT = Mission
Ready Technician.


THE EVALUATION EFFORT:

DESCRIPTION  AND  METHOD 

This research represented a comprehensive formative and summative evaluation of the
innovative concept in technical training called the MRT training program. This program was
developed by the Air Force Air Education and Training Command (AETC) to train entry-level
aircraft maintenance technicians on key tasks associated with job performance in their first
6 months on the job. This certification training program is quite different from the
“traditional” technical paradigm to the extent that training for certain job-relevant tasks
should permit the graduates to perform some tasks without additional on-the-job training at
their first job.

Task selection for MRT course content was accomplished by examining the previous course
content and the content of the Major Command (MAJCOM) qualifications courses, and through
consultation with technical school course personnel, MAJCOM representatives, and the
career-field functional manager. In the case of the C-141 and the F-16, the tasks to be
trained were already specified when the evaluation study was initiated. The tasks in the
C-141 and F-16 questionnaires represented the tasks that currently are certified in all
phases of the courses.

Thus, the MRT evaluation program had several objectives: to identify training needs and
thereby promote continuous improvement of training courses through interpretation of field
data; to develop routine benchmark measures that could be used for comparing training
programs, one to the other; to drive critical business decisions (such as new course
development or location of training, schoolhouse vs. field) by data; to assess course and
field performance to gain a better understanding of the readiness of new technicians; and,
finally, to grasp in more detail the nature and interrelations of the criteria chosen.

The evaluation effort involved the collection of both in-course (schoolhouse technical
training) and field data from trainees, instructors, and supervisors in the F-16 and C-141
aircraft. Questionnaires that listed each task were distributed, and respondents provided
necessary information as categorized in Table 1.

Information was collected in two studies. Study 1 was conducted with the C-141 aircraft (N =
177 trainees); Study 2, with the F-16 aircraft (N = 110 trainees). It is important to note
that, due to factors outside the researchers’ control, some of the data collection differed
between the two aircraft. Differences are noted in the results sections where appropriate.
The goal in conducting two separate studies was to provide evidence of consistent, stable
results across aircraft.

The job of the MRT is to prepare and repair an aircraft, including engines. The C-141 is a
large, four-engine transport, and the F-16 is a single-engine fighter. In the case of the
C-141 trainees, there were 107 tasks to be rated; 95 were rated for the F-16. The median
time since training (for the field ratings) was 6.5 months for the F-16 trainees and 8.5
months for the C-141 trainees. Over 50% of each sample worked the day shift, with 25% (F-16)
and 35% (C-141) on the swing shift; the remainder of each group worked nights or “other.”

Data were collected via questionnaire, either handed out (in the schoolhouse) or
mailed  (in  the  field).  Questionnaires  included  instructions  on  completion  and  return. 


STUDY  1  RESULTS: 

C-141  AIRCRAFT 

Both studies reported task means, intercorrelations, regression analyses, and exploratory
factor analyses. Regression analyses were carried out via standard hierarchical linear
regression; the factor analyses were carried out via the method of principal components,
using varimax rotation. The rationale for these methods was that our primary goal was to
understand the structure of the criterion space; in practice, this means studying the
interrelation among criteria in a number of different ways.

TABLE 2
Mean Task Ratings Identified by Source (C-141)

| Venue | Data Source | Measure | Statistic | Task 1 | Task 2 | Task 3 | Task 4 | Task 5 | Task 6 | Task 7 | Task 8 | Task 9 | Task 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Schoolhouse | MRT instructor | Task performance | M | 4.70 | 4.27 | 4.44 | 4.67 | 4.80 | 3.73 | 3.70 | 3.95 | 4.56 | 4.52 |
| | | | SD | 0.61 | 0.84 | 0.75 | 0.62 | 0.41 | 0.86 | 0.81 | 0.72 | 0.64 | 0.71 |
| | | No. of times tested | M | 5.50 | 1.19 | 1.11 | 6.19 | 7.36 | 1.39 | 1.33 | 1.75 | 3.25 | 3.86 |
| | | | SD | 7.69 | 0.53 | 0.40 | 8.93 | 9.82 | 0.52 | 0.51 | 1.36 | 3.40 | 6.27 |
| | MRT student | No. of times performed hands-on | M | 39.63 | 4.63 | 5.62 | 18.19 | 20.19 | 7.79 | 2.34 | 22.41 | 17.43 | 20.62 |
| | | | SD | 18.97 | 5.58 | 6.30 | 14.34 | 14.38 | 12.06 | 5.50 | 18.15 | 12.23 | 13.61 |
| | | No. of times observed | M | 28.90 | 5.82 | 5.63 | 14.08 | 15.23 | 6.82 | 2.89 | 15.26 | 13.64 | 19.00 |
| | | | SD | 20.15 | 7.47 | 6.75 | 12.35 | 13.38 | 10.34 | 4.61 | 14.59 | 11.14 | 13.63 |
| | | Difficulty to learn | M | 1.04 | 1.12 | 1.08 | 1.01 | 1.05 | 1.58 | 1.65 | 1.52 | 1.05 | 1.03 |
| | | | SD | 0.25 | 0.36 | 0.27 | 0.11 | 0.27 | 0.70 | 0.70 | 0.75 | 0.27 | 0.16 |
| | | Difficulty to perform | M | 1.05 | 1.17 | 1.11 | 1.01 | 1.06 | 1.55 | 1.63 | 1.51 | 1.08 | 1.09 |
| | | | SD | 0.27 | 0.44 | 0.32 | 0.11 | 0.34 | 0.68 | 0.76 | 0.73 | 0.35 | 0.33 |
| Field | MRT supervisor | % performing task | | 96.6 | 96.4 | 93.1 | 96.6 | 93.1 | 78.6 | 71.4 | 100.0 | 100.0 | 96.6 |
| | | Task performance | M | 4.69 | 4.25 | 4.21 | 4.83 | 4.62 | 2.50 | 2.21 | 3.45 | 4.69 | 4.52 |
| | | | SD | 1.00 | 1.46 | 1.52 | 0.93 | 1.29 | 1.88 | 1.91 | 1.33 | 0.71 | 1.12 |
| | | % performing below standard | | 3.45 | 17.86 | 13.79 | 3.45 | 6.90 | 50.00 | 53.57 | 17.24 | 0.00 | 3.45 |
| | | No. of times performed until no supervision | M | 1.11 | 1.00 | 0.97 | 1.00 | 0.96 | 0.92 | 0.96 | 1.00 | 1.07 | 1.00 |
| | | | SD | 0.32 | 0.27 | 0.33 | 0.00 | 0.19 | 0.40 | 0.37 | 0.00 | 0.26 | 0.00 |
| | MRT graduate | Month task first performed | M | 1.00 | 1.56 | 1.13 | 1.00 | 1.00 | 1.00 | 1.75 | 1.31 | 1.06 | 1.19 |
| | | | SD | 1.00 | 0.00 | 1.15 | 0.81 | 0.00 | 0.00 | 0.63 | 1.57 | 0.95 | 0.25 |
| | | No. of times task performed per month | M | 36.40 | 12.15 | 7.87 | 34.75 | 29.25 | 35.93 | 7.93 | 24.00 | 22.06 | 21.19 |
| | | | SD | 23.48 | 12.58 | 6.93 | 25.07 | 17.45 | 65.52 | 12.70 | 21.46 | 13.96 | 16.27 |
| | | Time (in hr) required to perform task | M | 0.72 | 0.50 | 0.52 | 0.54 | 0.58 | 1.10 | 0.92 | 1.13 | 0.57 | 0.52 |
| | | | SD | 1.26 | 1.30 | 1.27 | 1.31 | 1.36 | 1.98 | 2.07 | 1.85 | 1.27 | 1.32 |
| | | Confidence | M | 4.94 | 4.57 | 4.52 | 4.93 | 4.90 | 3.44 | 3.26 | 4.16 | 4.85 | 4.89 |
| | | | SD | 0.23 | 0.69 | 0.88 | 0.28 | 0.36 | 1.17 | 1.07 | 0.85 | 0.55 | 0.48 |

Note. Task names are as follows: 1 = Foreign object debris prevention; 2 = Inspect and
operate H-1 heater; 3 = Inspect and operate NF-2; 4 = Statically ground aircraft; 5 =
Inspect and position ground fire extinguisher; 6 = Maintenance data collection; 7 = Order
and turn parts; 8 = Maintain aircraft records; 9 = Open/close troop door; 10 = Open/close
crew entrance door. MRT = Mission Ready Technician.


Mean  Task  Ratings 

Table 2 provides a portion of the 107 C-141 mean task ratings for the various task scales
and identifies the data source, both in terms of location (schoolhouse or field) and source
(instructor, student, supervisor, graduate). Similar task ratings were undertaken for the
F-16 aircraft. Tables such as these were one of the most practical outcomes of this
research. Tasks perceived as difficult to perform or learn in the schoolhouse could be
identified and the training analyzed for possible improvements. Tasks that were not
performed frequently in the field could be identified, and the emphasis on those tasks in
schoolhouse training could be decreased. Similarly, schoolhouse training could be targeted
for examination of those tasks for which field performance was not considered sufficient
(either lower than acceptable mean performance or higher than acceptable percentage of
graduates performing below standard).
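
As an illustration of this kind of screening, the following minimal sketch applies two
cutoffs to the supervisor field ratings for the 10 tasks shown in Table 2. The data values
are taken from Table 2; the cutoff values and the function name are hypothetical and are not
reported in the study.

```python
# Supervisor field ratings for the 10 C-141 tasks shown in Table 2.
TASK_NAMES = {
    1: "Foreign object debris prevention",
    2: "Inspect and operate H-1 heater",
    3: "Inspect and operate NF-2",
    4: "Statically ground aircraft",
    5: "Inspect and position ground fire extinguisher",
    6: "Maintenance data collection",
    7: "Order and turn parts",
    8: "Maintain aircraft records",
    9: "Open/close troop door",
    10: "Open/close crew entrance door",
}
FIELD_PERFORMANCE_MEAN = [4.69, 4.25, 4.21, 4.83, 4.62, 2.50, 2.21, 3.45, 4.69, 4.52]
PCT_BELOW_STANDARD = [3.45, 17.86, 13.79, 3.45, 6.90, 50.00, 53.57, 17.24, 0.00, 3.45]

# Hypothetical cutoffs chosen only for illustration; the study does not report any.
MIN_ACCEPTABLE_MEAN = 3.0
MAX_ACCEPTABLE_PCT_BELOW = 20.0


def flag_tasks_for_review(means, pct_below, min_mean, max_pct):
    """Return task numbers whose field results fall outside either cutoff."""
    flagged = []
    for task_no, (mean, pct) in enumerate(zip(means, pct_below), start=1):
        if mean < min_mean or pct > max_pct:
            flagged.append(task_no)
    return flagged


flagged = flag_tasks_for_review(
    FIELD_PERFORMANCE_MEAN,
    PCT_BELOW_STANDARD,
    MIN_ACCEPTABLE_MEAN,
    MAX_ACCEPTABLE_PCT_BELOW,
)
for task_no in flagged:
    print(task_no, TASK_NAMES[task_no])
```

With these assumed cutoffs, the sketch flags Task 6 (maintenance data collection) and Task 7
(order and turn parts), the two tasks in the excerpt with the lowest mean field performance.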

Correlations 

Reported  correlations  are  not  based  on  107  individuals  but  on  the  mean  ratings  of 
107  tasks.  Therefore,  each  of  the  107  data  points  is  not  subject  to  the  same  degree 
of  error  as  in  the  usual  case  of  correlations  across  individuals.  Hence,  the  resulting 
correlations  are  likely  to  be  more  stable  than  is  typically  the  case;  for  this  reason, 
traditional  tests  of  statistical  significance  do  not  apply. 
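
The aggregation step described above can be sketched as follows. The ratings are simulated,
not study data: the number of raters, the noise levels, and all variable names are
assumptions made for illustration. The sketch averages ratings within each task and then
correlates the 107 task means, and it compares that value with the correlation obtained from
a single rater's ratings.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def task_means(ratings):
    """Average a raters-by-tasks table of ratings within each task."""
    n_raters = len(ratings)
    return [sum(column) / n_raters for column in zip(*ratings)]


rng = random.Random(0)
N_TASKS = 107   # number of C-141 tasks in the study
N_RATERS = 30   # hypothetical number of raters per task

# Simulated data: each task has a "true" level that every rater sees with noise.
task_level = [rng.gauss(0, 1) for _ in range(N_TASKS)]
ratings_a = [[t + rng.gauss(0, 2) for t in task_level] for _ in range(N_RATERS)]
ratings_b = [[0.5 * t + rng.gauss(0, 2) for t in task_level] for _ in range(N_RATERS)]

means_a = task_means(ratings_a)
means_b = task_means(ratings_b)

r_task_means = pearson_r(means_a, means_b)            # unit of analysis: task
r_single_rater = pearson_r(ratings_a[0], ratings_b[0])  # one rater, same 107 tasks

print(f"r across task means:  {r_task_means:.2f}")
print(f"r for a single rater: {r_single_rater:.2f}")
```

Because averaging over raters removes much of the rater-level noise, the correlation across
task means in this simulation is considerably larger than the single-rater correlation.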

Correlations among frequency of performance ratings. As pointed out previously, frequency
of task performance and task importance are the two dimensions most commonly designed into
task analysis questionnaires. In the schoolhouse, frequency of task performance was
estimated by students in terms of the number of times the task was performed in that
environment and the number of times the student observed others perform the task. In the
field, frequency was estimated by the number of times the task was performed in the field
per month (graduate ratings). The schoolhouse frequency ratings converged with the field per
month ratings (rs = .38 and .40, respectively).

Correlations involving task difficulty. Students in the schoolhouse rated difficulty to
learn task and difficulty to perform. These two measures correlated positively, as expected
(r = .88). A number of measures can theoretically be predicted to relate negatively to these
difficulty measures. Difficulty to learn and difficulty to perform correlated negatively
with task performance in the schoolhouse as rated by the instructor (rs = -.47 and -.45,
respectively). Similarly, the difficulty ratings correlated negatively with supervisor field
performance ratings (rs = -.19 and -.20, respectively), and with graduate field confidence
(rs = -.21 and -.18, respectively).

Correlation between confidence and performance. Confidence and performance in the field
are related (r = .68). More will be said about the relation between confidence and
performance in the discussion on the results for the F-16 and in the general discussion.
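
For readers who wish to see the computation, the sketch below correlates graduate field
confidence with supervisor-rated field performance for the 10 tasks shown in Table 2. This
is only an excerpt of the 107 tasks, so the result is not expected to reproduce the reported
r = .68; the helper function name is an assumption made for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Task means for Tasks 1-10, copied from Table 2 (field venue).
graduate_confidence = [4.94, 4.57, 4.52, 4.93, 4.90, 3.44, 3.26, 4.16, 4.85, 4.89]
supervisor_performance = [4.69, 4.25, 4.21, 4.83, 4.62, 2.50, 2.21, 3.45, 4.69, 4.52]

r_excerpt = pearson_r(graduate_confidence, supervisor_performance)
print(f"r across the 10 excerpted tasks: {r_excerpt:.2f}")
```

In this small excerpt the two sets of task means rise and fall together very closely, which
is in the same direction as the positive relation reported for the full set of tasks.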

Exploratory  Factor  Analyses 

We expected four factors to emerge from our exploratory factor analysis, mirroring the four
a priori groupings of variables: task frequency and timing, task confidence, task
performance, and task demand. That is, a rational grouping of variables suggested that there
might be four underlying factors. Results of an exploratory factor analysis are somewhat
consistent with this prediction, with several exceptions. First, information from the field
(task confidence and task performance) seemed to group together (33% of total variance).
Task demand seemed to split into two factors: task difficulty (14% of total variance) and
task demand (9% of total variance). Task frequency and timing accounted for 19% of the total
variance in the factor analysis.
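
A minimal sketch of principal components extraction followed by varimax rotation is given
below, assuming NumPy is available. It is not the software or the data used in the study:
the data are simulated with two latent factors, and the function names, the number of
measures, and the loading pattern are all assumptions made for illustration. The share of
total variance attributed to each rotated factor is the sum of its squared loadings divided
by the number of measures.

```python
import numpy as np


def principal_component_loadings(corr, n_factors):
    """Loadings of the first n_factors principal components of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(corr)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]     # keep the largest ones
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    share_of_variance = eigvals[order] / corr.shape[0]
    return loadings, share_of_variance


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (iterative SVD form). Returns loadings and rotation."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / n_vars
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0.0 and new_criterion / criterion < 1.0 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Simulated task-level data: 107 tasks, 6 measures, 2 latent factors (all hypothetical).
rng = np.random.default_rng(0)
n_tasks, n_measures = 107, 6
latent = rng.normal(size=(n_tasks, 2))
pattern = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
data = latent @ pattern.T + 0.5 * rng.normal(size=(n_tasks, n_measures))

corr = np.corrcoef(data, rowvar=False)
loadings, unrotated_share = principal_component_loadings(corr, n_factors=2)
rotated, rotation = varimax(loadings)
rotated_share = (rotated**2).sum(axis=0) / n_measures

print(np.round(rotated, 2))
print("share of total variance per rotated factor:", np.round(rotated_share, 2))
```

Varimax is an orthogonal rotation, so it redistributes variance among the retained factors
without changing the total variance they explain or the communality of any measure.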

Regression  Analyses 

An initial regression analysis explored the impact of the schoolhouse data on graduate
performance. The multiple correlation of prediction of field task performance, using only
schoolhouse variables, is maximized at about .28. However, if field “predictor” variables
are included (but excluding any variables for which data are provided by the supervisor, the
source of the criterion ratings), multiple correlation can be increased to a substantial .80
(R² = .63). The statistics for these analyses are found in Table 3.

TABLE 3
Regression Providing Maximal “Prediction” of Task Performance in the Field (C-141)

Standardized regression coefficients are shown for each step.

| Variable | Step 1 | Step 2 | Step 3 | Step 4 | ΔR² | R² |
|---|---|---|---|---|---|---|
| Block 1: Frequency and timing variables | | | | | .38 | .38 |
| Student: Mean number of times observed in schoolhouse | -.44 | -.61 | -.55 | -.62 | | |
| Student: Mean number of times performed in schoolhouse | .34 | .53 | .47 | .54 | | |
| Graduates: Mean number of times task performed in field per month | .63*** | .43*** | .47*** | [missing] | | |
| Instructor: Mean number of times tested in schoolhouse | .04 | -.04 | -.05 | -.02 | | |
| Block 2: Confidence variables | | | | | .22 | .60 |
| Graduates: Mean confidence in field | | .53*** | .51*** | .47*** | | |
| Block 3: Difficulty variables | | | | | .01 | .61 |
| Student: Mean difficulty to learn in schoolhouse | | | -.02 | -.09 | | |
| Student: Mean difficulty to perform in schoolhouse | | | -.08 | -.04 | | |
| Block 4: Demand variables | | | | | .02 | .63 |
| Graduates: Mean time, in hours, required to perform task in field | | | | -.08 | | |
| Graduates: Mean month task first performed in field | | | | .11* | | |
| Supervisor: Mean number of times performed in field until no supervision | | | | .10* | | |

Note. The dependent variable is Supervisor: Mean task performance in field. All effect sizes
greater than .20 are underlined.

*p < .10. ***p < .01.

Table  3  indicates  that,  in  addition  to  the  schoolhouse  variables,  two  other  major 
variables  add  to  prediction  in  this  sample:  the  number  of  times  a  task  is  reported  by 
the  graduate  as  being  performed  in  the  field  and  graduate  task  confidence  in  the  field. 
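
The blockwise logic of a hierarchical regression can be sketched as follows, assuming NumPy
is available. The data are simulated and the block contents, effect sizes, and function
names are hypothetical; only the procedure (enter blocks in order, record R² and the
increment in R² at each step) corresponds to the analysis described here. The multiple
correlation is the square root of R², so an R² of .63 corresponds to a multiple correlation
of about .79.

```python
import numpy as np


def r_squared(predictors, outcome):
    """R-squared from an ordinary least squares fit with an intercept."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coefs
    centered = outcome - outcome.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)


def hierarchical_r2(blocks, outcome):
    """Enter predictor blocks in order; return (name, delta R2, cumulative R2) per step."""
    steps, entered, previous_r2 = [], [], 0.0
    for name, block in blocks:
        entered.append(block)
        r2 = r_squared(np.column_stack(entered), outcome)
        steps.append((name, r2 - previous_r2, r2))
        previous_r2 = r2
    return steps


# Simulated task-level data for 107 tasks (all values hypothetical).
rng = np.random.default_rng(1)
n_tasks = 107
frequency = rng.normal(size=(n_tasks, 2))
confidence = 0.5 * frequency[:, :1] + rng.normal(size=(n_tasks, 1))
difficulty = rng.normal(size=(n_tasks, 2))
performance = (0.4 * frequency[:, 0] + 0.6 * confidence[:, 0]
               + rng.normal(scale=0.8, size=n_tasks))

blocks = [("frequency", frequency), ("confidence", confidence), ("difficulty", difficulty)]
steps = hierarchical_r2(blocks, performance)
for name, delta_r2, r2 in steps:
    print(f"{name:<12} delta R2 = {delta_r2:.2f}   R2 = {r2:.2f}   R = {np.sqrt(r2):.2f}")
```

Because each step adds predictors to the previous model, R² cannot decrease from one step to
the next, and the increments sum to the final R².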


SUMMARY OF STUDY 1:

C-141  AIRCRAFT 

In summary, many of the correlations among key variables were in the expected direction for
this investigation of the C-141 MRT training evaluation criteria. Schoolhouse factors
predicted graduate job performance to some degree, but this prediction was significantly
enhanced when information from the field was added to the regression analysis. Exploratory
factor analysis showed that, although the loadings of variables onto factors were not
perfect, they did tend to mirror our four a priori factors. These initial results suggest
that the criteria of task frequency and timing, task confidence, task performance, and task
demand are a worthwhile way of characterizing the criterion space.


STUDY  2  RESULTS: 

F-16 AIRCRAFT

Specific Criteria Collected for F-16 Aircraft

Consistent  with  the  model  presented  in  Table  1,  information  was  gathered  from  the 
schoolhouse and field, from MRT trainees, instructors, supervisors, and MRT
graduates  in  the  areas  of  frequency  and  timing,  task  confidence,  task  performance, 
and  task  demand.  The  purpose  of  collecting  information  from  this  second  aircraft 
was  to  provide  evidence  of  consistent  results. 


Correlations 

Again, as with the C-141 aircraft, correlations are based not on 95 individuals
but on the mean ratings of 95 tasks. Therefore, each of the 95 data points
is not subject to the same degree of error as in the usual case of correlations across
individuals.
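
The task-level approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the rating records, the task labels, and the helper names `task_means` and `pearson_r` are all hypothetical. Ratings are first averaged within each task, and the correlation is then computed across tasks, so that each data point is a task mean rather than an individual rater.

```python
from collections import defaultdict
from math import sqrt

def task_means(ratings):
    """Average individual ratings within each task.

    `ratings` is a list of (task_id, value) pairs (a hypothetical layout).
    """
    grouped = defaultdict(list)
    for task_id, value in ratings:
        grouped[task_id].append(value)
    return {task: sum(vals) / len(vals) for task, vals in grouped.items()}

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up ratings for three tasks; these are NOT data from the study.
confidence = [("t1", 4), ("t1", 5), ("t2", 2), ("t2", 3), ("t3", 3), ("t3", 4)]
performance = [("t1", 5), ("t1", 4), ("t2", 2), ("t2", 2), ("t3", 4), ("t3", 3)]

conf_means = task_means(confidence)
perf_means = task_means(performance)
tasks = sorted(conf_means)  # one data point per task, not per rater
r = pearson_r([conf_means[t] for t in tasks], [perf_means[t] for t in tasks])
print(round(r, 2))
```

Because each point is already an average over raters, rater-level noise is partly averaged out before the correlation is computed, which is the reason given above for treating these correlations differently from correlations across individuals.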

Correlations among frequency of performance ratings. As pointed out
previously, frequency of task performance and task importance are the two
dimensions that are most commonly designed into task analysis questionnaires. The
schoolhouse frequency ratings converged with the field per month ratings (rs = .56
and .75, respectively).

Correlations between confidence and performance. Some of the most
interesting correlations involve MRT confidence. Confidence in the schoolhouse is
related to performance in that environment (r = .85). Similarly, confidence and
performance in the field are related (r = .86). Confidence in the schoolhouse is related
to confidence in the field (r = .40). The fact that confidence relates to performance
within the same situation (schoolhouse, field) supports Bandura’s (1977)
contention that efficacy-type variables are situation specific in their predictive abilities.
The fact that confidence is so highly related to performance has practical
implications for training.


Exploratory Factor Analyses

We expected four factors to emerge from our exploratory factor analysis, mirroring
the four groupings of variables: task frequency and timing, task confidence, task
performance, and task demand. Results of an exploratory factor analysis are
somewhat consistent with this prediction, with several exceptions. Task frequency and
timing and task demand were two clear factors in the factor analysis, accounting
for 15% and 11%, respectively, of the total variance. Task confidence in the
schoolhouse loaded with instructor ratings of performance (12% of variance), and
confidence in the field loaded with other field performance measures (38% of the
variance). This indicates a blending of our criteria with venues.
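
As an illustration of what "percentage of total variance" means for a factor, the sketch below uses one common convention: for standardized variables and orthogonal factors, a factor's share of total variance is the sum of its squared loadings divided by the number of variables. The text does not state which extraction or rotation method was used, so this convention is an assumption, and the loading matrix and the function name `variance_explained` are hypothetical. Only the four percentages in `reported` come from the paragraph above.

```python
def variance_explained(loadings):
    """Share of total variance per factor.

    `loadings` is a list of rows (one per standardized variable), each row
    holding that variable's loading on every factor. The share for a factor
    is its column sum of squared loadings divided by the number of variables.
    """
    n_vars = len(loadings)
    n_factors = len(loadings[0])
    return [sum(row[f] ** 2 for row in loadings) / n_vars
            for f in range(n_factors)]

# Hypothetical loadings for four variables on two factors (not study data).
loadings = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.7],
    [0.2, 0.6],
]
shares = variance_explained(loadings)
print([round(s, 3) for s in shares])

# The four F-16 shares reported in the text, summed to a cumulative share.
reported = [0.15, 0.11, 0.12, 0.38]
print(round(sum(reported), 2))
```

Summing the four reported shares gives the cumulative proportion of variance covered by the four-factor solution.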


Regression  Analyses 

An initial regression analysis explored the impact of the schoolhouse data on
graduate performance. The multiple correlation of prediction of field task performance,
using only schoolhouse variables, is maximized at about .51. However, if field
“predictor” variables are included (but excluding any variables for which data are
provided by the supervisor, the source of the criterion ratings), multiple correlation
can be increased to a substantial .88 (R² = .77). The statistics for this regression are
found in Table 4.
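
The blockwise logic behind such a regression can be sketched as follows. This is a minimal illustration, not the analysis actually run for this study: predictors are entered in blocks, R² is recomputed after each block, and the increment (ΔR²) is the gain over the previous step. The multiple correlation is the square root of R², which is how a multiple correlation of .88 corresponds to an R² of about .77. The data values, block names, and the function names `solve`, `r_squared`, and `hierarchical_r2` are hypothetical, and ordinary least squares via the normal equations is an assumption chosen for brevity.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def r_squared(columns, y):
    """R-squared of an ordinary least squares fit of y on the predictor columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]  # intercept first
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def hierarchical_r2(blocks, y):
    """Enter blocks in order; return (block name, delta R-squared, cumulative R-squared)."""
    entered, previous, steps = [], 0.0, []
    for name, columns in blocks:
        entered.extend(columns)
        current = r_squared(entered, y)
        steps.append((name, current - previous, current))
        previous = current
    return steps

# Made-up task-level means for eight tasks; these are NOT data from the study.
field_performance = [3.0, 3.5, 2.0, 4.5, 4.0, 2.5, 3.8, 3.2]
times_in_school = [2, 3, 1, 5, 4, 1, 3, 2]
times_in_field = [4, 6, 2, 9, 7, 3, 8, 5]
field_confidence = [3.1, 3.4, 2.2, 4.6, 3.9, 2.4, 4.0, 3.0]

blocks = [
    ("frequency and timing", [times_in_school, times_in_field]),
    ("confidence", [field_confidence]),
]
steps = hierarchical_r2(blocks, field_performance)
for name, delta, total in steps:
    print(f"{name}: delta R2 = {delta:.3f}, R2 = {total:.3f}, R = {sqrt(total):.3f}")
```

Because each later model contains all earlier predictors, R² cannot decrease from one step to the next, and the increments sum to the final R².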


SUMMARY  OF  STUDY  2: 

F-16 AIRCRAFT

Again,  for  this  aircraft,  many  of  the  correlations  among  key  variables  were  in  the 
expected  direction.  Schoolhouse  factors  predicted  graduate  job  performance  to 
some  degree,  but  this  prediction  was  significantly  enhanced  when  information 
from  the  field  was  added  to  the  regression  analysis.  Exploratory  factor  analysis 
showed  that,  although  the  loadings  of  variables  onto  factors  were  not  perfect,  they 
did tend to mirror our four factors. These initial results again support the idea that
the criteria of task frequency and timing, task confidence, task performance, and task
demand are a useful way of thinking about the criterion space.


DISCUSSION 

One major goal of the MRT project was eminently practical: to provide
informative feedback about how MRT trainees are doing, both in the schoolhouse and in
the  field,  to  instructional  designers,  trainers,  and  those  charged  with  supervision  of 
training  and  design.  Other  briefings  and  documents  have  addressed  these  issues. 
TABLE 4

Regression Providing Maximal “Prediction” of Task Performance
in the Field (F-16)

                                                    Standardized Regression
Variable                                                 Coefficients            ΔR²    R²

Block 1: Frequency and timing variables                                          .65    .65
  Instructor: Mean number of times tested
    in schoolhouse                                   .08      .01      .03
  Student: Mean number of times performed
    in schoolhouse                                   2'j**    .01      .00
  Student: Mean number of times observed
    in schoolhouse                                  -.06     -.12     -.24
  Student: Recommended number of times
    performed in training                           -.34      .03      .09
  Graduates: Mean number of times task
    performed in field per month                     .66***   .14      .22
Block 2: Confidence variables                                                    .34    .87
  Student: Confidence in training                   -.13     -.11
  Graduates: Mean confidence in field                .86***
Block 3: Demand variables                                                        .02    .88
  Graduates: Mean time, in hours, required
    to perform task in field                        -.03
  Graduates: Mean month task first
    performed in field                              -.13
  Supervisor: Mean number of times performed
    in field until no supervision                   -.02

Note. The dependent variable is Supervisor: Mean task performance in field. All effect sizes
greater than .20 are underlined.

**p < .05. ***p < .01.


However, the interested reader has only to examine Table 2 to see how mean task
ratings on a variety of scales, derived from a variety of sources and from different
venues, can be a valuable diagnostic tool for a training initiative. What tasks are
not performed well? Which are the hardest to learn? On which tasks do trainees
have the least confidence that they can perform well? These kinds of questions,
answered by clear schoolhouse and field data, can inform revision of training or the
design of new training. Thus, the practical goal of this MRT training evaluation
effort, although formally outside the scope of this article, can be appreciated from a
review of our results.

From a technical research point of view, the goals of this study were severalfold.
For many years, researchers have been trying to identify underlying taxonomies of
human work performance. Increasingly, too, efforts have been made to clarify the
training criterion space. As the literature review showed, these are conceptually
related efforts. Some training criteria, particularly on-the-job measures, are simply
ways to assess work performance.



One feature of the MRT effort is to advance our understanding of the training
criteria space: how training criteria relate and how measures converge and diverge.
This convergence-divergence, or underlying structure, is an important issue
because it addresses which measures offer unique variance to the criterion space and
which are more or less overlapping. For the researcher who wants to be able to tell
practitioners what to measure, this type of independence analysis can provide
practical guidance.

The  Structure  of  the  Criterion  Space 

Exploratory factor analyses suggest a large degree of replication between the two
aircraft. In each case, the largest single factor was what might be termed field
performance, with supervisor rating of MRT graduate performance receiving the
highest loadings. Also loading highly on this factor were percentage performing
task and task field confidence. The second factor was composed of schoolhouse
measures of frequency and timing of testing and observation. The third factor was
either schoolhouse task difficulty (C-141) or task confidence (F-16). Finally, there
was a field task demand (or field task difficulty) factor.

Given that there was some difference in measures between the two aircraft,
the amount of similarity in structure is very substantial. We can reasonably
suggest that field performance, training frequency and timing (or duration),
difficulty to learn (or demand), field demand (or difficulty), and confidence represent
five replicable dimensions of task measures. Thus, when designing a training
evaluation program, elements of each of these could be incorporated into the
measures used.

Prediction 

Prediction of field performance by schoolhouse variables. In the case of
each aircraft, the strongest schoolhouse predictor of field performance was the
number of times a task was reported to have been performed in the schoolhouse.
This may indicate that practice with, familiarity with, and duration (or frequency) of
exposure to a task can have an impact on how well that task is performed later.
Indeed, the truism that “practice makes perfect” is certainly supported here.

Optimal prediction of field performance. Tables 3 and 4 report hierarchical
regressions, using field performance as the criterion. As noted previously, the
main structural difference between the two tables is that task difficulty variables
were available for C-141. Squared multiple correlation in both cases exceeds
.60. This is very high, given that common method variance can be ruled out as an
explanation. If we accept as good predictors only those variables that have
standardized regression weights above an arbitrary .20 in both samples, then we find
the predictive variables are number of times observed in schoolhouse, number of
times performed in the field, and confidence in the field. Although the first two
may relate to practice (but, surprisingly, number of times observed in the
schoolhouse receives a negative weight), it is clear that confidence ratings alone add over
20% of unique variance beyond that accounted for by schoolhouse variables. This relation
between confidence and performance deserves some discussion.
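
The screening rule described above (keep only variables whose standardized weight exceeds .20 in both samples) can be sketched as below. Because a negatively weighted variable is counted among the predictors, the sketch assumes the rule applies to the absolute value of the weight. The function name `replicated_predictors`, the variable labels, and every coefficient are hypothetical placeholders chosen for illustration; they are not the values in Tables 3 and 4.

```python
def replicated_predictors(sample_a, sample_b, threshold=0.20):
    """Variables whose |standardized weight| exceeds `threshold` in both samples.

    Each argument maps a variable label to its standardized regression weight.
    Variables missing from either sample are ignored.
    """
    shared = sample_a.keys() & sample_b.keys()
    return sorted(
        name for name in shared
        if abs(sample_a[name]) > threshold and abs(sample_b[name]) > threshold
    )

# Made-up weights for two samples; these are NOT the coefficients from the tables.
sample_one = {
    "times_observed_school": -0.25,
    "times_performed_field": 0.30,
    "confidence_field": 0.60,
    "confidence_school": 0.05,
}
sample_two = {
    "times_observed_school": -0.22,
    "times_performed_field": 0.21,
    "confidence_field": 0.80,
    "confidence_school": -0.10,
    "time_required_field": -0.03,
}

replicated = replicated_predictors(sample_one, sample_two)
print(replicated)
```

Requiring the threshold to be met in both samples is a simple replication check: a weight that is large in only one sample is not retained.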

The  predictive  power  of  confidence.  The  same  general  relation  between 
performance  and  confidence  holds  in  both  aircraft.  This  relation  is  in  both  cases  a 
remarkably strong one (β = .68 and .86, respectively). Why should confidence
predict  performance  so  well?  One  obvious  answer  is  found  in  the  work  of  Bandura 
(1977, 1984). He proposed a concept called self-efficacy. Self-efficacy is a focused
belief  in  the  ability  to  perform  a  specific  task  or  in  a  specific  arena  of  activity. 
Hence,  the  MRT  measures  of  confidence,  because  they  were  precisely  at  the  task 
level,  can  be  considered  measures  of  self-efficacy  in  Bandura’s  sense.  Given  the 
strength  of  prediction  found  in  this  study,  it  is  interesting  to  note  that  Bandura 
(1984)  suggested  that  self-efficacy  will  “usurp  the  lion’s  share  of  the  variance  in 
human  conduct”  (p.  252). 

Implications  for  Training  Evaluation 

There are several implications for training evaluation that can be derived from the
research presented in this article. First, assessing training at the task level can result
in concrete improvements in training. Second, task difficulty and timing-frequency
measures can be a useful addition to the evaluator’s collection of training
evaluation measures. Third, because source effects (e.g., supervisor vs. trainer vs.
trainee) do not obscure rational content relations among measures, evaluators can
with some confidence obtain measures from different sources without worrying
that source effects will substantially impact the outcomes. A fourth practical
implication may be that in training, and later in the field, fostering self-confidence at the
task level is critical.